ABSTRACT

Random samples are a popular method of summarizing data 
and allow queries posed over the data to be approximated by 
applying an appropriate estimator to the sample. 

Performance on a particular query, however, hinges on estimator selection. The choice of estimators is subject to
global requirements, such as unbiasedness and range restrictions on the estimate value, and ideally, we seek estimators
that are both efficient to derive and apply and optimal in that
they are not dominated by other estimators. Nevertheless,
for a given data domain, sampling scheme, and query, there 
are many applicable estimators. In this work, we aim to un- 
derstand and leverage this choice. When the data exhibits 
patterns that we can learn or observe, we can customize es- 
timators to these patterns to obtain better performance while 
maintaining certain guarantees for all possible data in the 
domain. 

Through a case study of coordinated sampling and focus- 
ing on estimators that are variance optimal, nonnegative, and 
unbiased, we set some foundations for the derivation of cus- 
tomized estimators and demonstrate its usefulness for data 
analysis applications. 



1. INTRODUCTION

Random samples are extensively used for scalable analysis of massive data. The samples facilitate approximate processing of queries posed over the original data, when exact processing is too resource consuming or when the original data is no longer available. Random samples have a distinct advantage over other synopses in their flexibility, in terms of supported queries.

Queries performed over the data range from basic subset statistics, such as sums, moments, and averages, to more complex relations: distinct counts, sizes of set intersections, and difference norms. The value of a sample hinges on the accuracy with which we can estimate query results. In turn, this boils down to the estimators we use, which are the functions we apply to the sample to produce the estimate.

We seek estimators that satisfy some global properties, 
which hold for all possible data in our domain. Some global 
properties we consider are 



• Range restriction of estimates: since the estimate is often used as a substitute for the true value, we would like it to be from the same range as the query result. Some common restrictions are nonnegativity (when the query result is nonnegative), or boundedness, which means that the range of estimates is bounded by some function of the query result.

• Unbiasedness, which means that the expectation of the es- 
timate is equal to the query result. Unbiasedness is particu- 
larly important when we estimate a sum aggregate by sum- 
ming estimates of components, and wish the relative error to 
decrease with aggregation. 

• Finite variance (implied by boundedness but less restric- 
tive) 

Perhaps the most basic quality measure of an estimator 
is its variance. The variance, however, depends on the in- 
put data set, and in general there is no single estimator that 
is optimal for all data values in our domain. We therefore 
aim for variance optimality - meaning that strict improve- 
ment is not possible without violating some global proper- 
ties. More precisely, there is no estimator with at most the 
variance of our estimator on all data and strictly lower variance on some data. We also consider variance competitiveness [8], meaning that the variance on each data vector is not too far off the best possible for the data subject to the global properties. Variance competitiveness provides "worst case"
performance guarantees over data in our domain. It may not 
always be desirable or achievable but is useful when we need 
some guarantees for all possible data. 

We treat the problem of deriving estimators from an opti- 
mization perspective, looking for estimators that satisfy global 
properties and optimality but at the same time, are tailored to 
perform better when the data follows some stated patterns. 
The first component is to understand the set of applicable 
estimators, those satisfying the global properties we are af- 
ter and (Pareto) optimality. We then hope to leverage the 
freedom we have in estimator selection to derive customized 
estimators that perform better on recurring patterns we can
learn or observe in the data, such as sparsity of certain rela- 
tions between entries. 

We explore this process for coordinated shared-seed sampling. The sampling scheme is widely applicable and thus the development of good estimators is independently interesting. It is also sufficiently complex to be technically challenging and yet simple enough to allow for a comprehensive understanding. In particular, variance competitiveness is attainable [8] and there is a complete characterization of functions for which estimators with certain global properties exist [8].

Example 1 Dataset with 3 instances and queries 

Instances $i \in \{1, 2, 3\}$ and keys $k \in \{a, b, c, d, e, f, g, h\}$:

instance     a      b      c      d      e      f      g      h
   1        0.95   0      0.23   0.70   0.10   0.42   0      0.32
   2        0.15   0.44   0      0.80   0.05   0.50   0.20   0
   3        0.25   0      0      0.10   0      0.22   0      0









Example queries over selected keys $H \subset \{a, \ldots, h\}$: the $L_p$ difference; $L_p^p$, which is the $p$th power of the $L_p$ difference and a sum aggregate that can be used to estimate the $L_p$ difference; and $L_{p+}^p$, the asymmetric (increase-only) $L_p^p$. The sum of the increase-only and the decrease-only changes (decrease-only is obtained by switching the roles of $v_1$ and $v_2$) is $L_p^p$, but each component is a useful metric for asymmetric change. $G$ is an "arbitrary" sum aggregate, illustrating versatility (our work is applicable to any function for which estimators exist).

sum aggregate                                                   tuple function
$L_p^p(H) = \sum_{k \in H} RG_p(v^{(k)})$                       $RG_p(v) = (\max(v) - \min(v))^p$
$L_{p+}^p(H) = \sum_{k \in H} RG_{p+}(v_1^{(k)}, v_2^{(k)})$    $RG_{p+}(v_1, v_2) = \max\{0, v_1 - v_2\}^p$
$G$                                                             $g(v_1, v_2, v_3) = |v_1 + v_3 - 2 v_2|^2$

$L_1(\{b,c,e\}) = |0 - 0.44| + |0.23 - 0| + |0.10 - 0.05| = 0.72$
$L_2^2(\{c,f,h\}) = (0.23 - 0)^2 + (0.50 - 0.42)^2 + (0.32 - 0)^2 \approx 0.16$
$L_2(\{c,f,h\}) = \sqrt{L_2^2(\{c,f,h\})} \approx 0.40$
$L_{1+}(\{b,c,e\}) = \max\{0, 0 - 0.44\} + \max\{0, 0.23 - 0\} + \max\{0, 0.10 - 0.05\} = 0.28$
$G(\{b,d\}) = |0 - 2 \cdot 0.44 + 0|^2 + |0.7 - 2 \cdot 0.8 + 0.1|^2 \approx 1.41$
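The example queries of Example 1 can be evaluated directly on the full data, which is a useful reference point when judging sample-based estimates. The following is a minimal sketch, not part of the original work: the names `DATA`, `rg`, `rg_plus`, `g`, and `sum_aggregate` are illustrative, absent table entries are assumed to be zero, and the two-instance queries are assumed to use instances 1 and 2, as in the worked expressions.

```python
# Example 1 data: weights of each key in instances 1, 2, 3 (absent entries are 0).
DATA = {
    "a": (0.95, 0.15, 0.25),
    "b": (0.00, 0.44, 0.00),
    "c": (0.23, 0.00, 0.00),
    "d": (0.70, 0.80, 0.10),
    "e": (0.10, 0.05, 0.00),
    "f": (0.42, 0.50, 0.22),
    "g": (0.00, 0.20, 0.00),
    "h": (0.32, 0.00, 0.00),
}


def rg(values, p):
    """Exponentiated range RG_p(v) = (max(v) - min(v))^p."""
    return (max(values) - min(values)) ** p


def rg_plus(v1, v2, p):
    """Asymmetric (increase-only) range RG_{p+}(v1, v2) = max{0, v1 - v2}^p."""
    return max(0.0, v1 - v2) ** p


def g(v1, v2, v3):
    """The 'arbitrary' tuple function g(v1, v2, v3) = |v1 + v3 - 2 v2|^2."""
    return abs(v1 + v3 - 2 * v2) ** 2


def sum_aggregate(tuple_fn, keys):
    """Sum a tuple function over the selected keys H."""
    return sum(tuple_fn(DATA[k]) for k in keys)


# Queries over instances 1 and 2 (first two coordinates of each tuple).
l1 = sum_aggregate(lambda v: rg(v[:2], 1), "bce")
l2_squared = sum_aggregate(lambda v: rg(v[:2], 2), "cfh")
l2 = l2_squared ** 0.5
l1_plus = sum_aggregate(lambda v: rg_plus(v[0], v[1], 1), "bce")
g_bd = sum_aggregate(lambda v: g(*v), "bd")
```

Each query is a sum over keys of a function of a single key's tuple, which is why the rest of the paper can focus on estimating tuple functions.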



Coordinated sampling is particularly useful when data sets 
have the form of multiple instances, which are weight as- 
signments over the same set of keys. Different instances 
may correspond to snapshots, activity logs, measurements, 
or repeated surveys at different times or locations. Coordi- 
nation of the samples of different instances allows for more 
accurate estimates of aggregates such as distinct counts and 
similarity measures. Such aggregates often either have the form of a sum over keys of a "tuple" function applied to the values of the key in the different instances, or otherwise can be expressed as functions of such sums: distinct count is a sum aggregate of logical OR, and the $L_p$ difference is the $p$th root of $L_p^p$, which sum-aggregates the exponentiated range function $RG_p(v) = (\max(v) - \min(v))^p$ ($p > 0$). Example 1 illustrates a data set, example aggregate queries, and the respective tuple functions.

We focus on estimating "tuple" functions defined over the tuple of the values of a single key. Sum aggregates can be estimated by summing up tuple estimators. Tuple estimates typically have high variance, since most or all of the entries are missing from the sample. We therefore insist on unbiasedness and (at least) pairwise independence of the tuple estimates, which allow the relative error to decrease with aggregation. Since the tuple functions we are interested in are nonnegative, we also require the tuple estimates to be nonnegative [8] (results extend to any one-sided range restriction on the estimates).

The pointwise-optimal range: We express the range of estimators that are variance+ optimal, where we define variance+ optimality as variance optimality over unbiased nonnegative estimators. The pointwise optimal range of estimates (Section 3) is defined for an outcome, conditioned on estimate values on all "less-informative" outcomes, and includes the range of estimate values that are optimal with respect to at least one data vector that is consistent with the outcome. We show that being "in range" almost everywhere is necessary for variance+ optimality and sufficient for unbiasedness and nonnegativity, when an unbiased nonnegative estimator exists for the function $f$.

The L and U estimators: The lower extreme of the pointwise optimal range is the L estimator (Section 4), which has a compelling combination of properties. It satisfies both our quality measures, being variance+ optimal and 4-competitive. The L estimator is monotone, meaning that when fixing the data vector, the estimate value is monotone non-decreasing with the information we can glean from the outcome (the set of vectors consistent with our sample). In fact, the L estimator is the unique variance+ optimal monotone estimator and thus dominates (in terms of variance, pointwise) the Horvitz-Thompson estimator [10] (which is also unbiased, nonnegative, and monotone), when the latter is applicable. We show that the competitive ratio of 4 of the L estimator is tight in the sense that there is a family of functions on which the supremum of the ratio, over functions and data vectors, is 4. The ratio can be lower, however, for specific functions. Finally, we give a simple expression for the L estimator which allows it to be efficiently computed by numeric integration or in closed form.

The upper extreme of the pointwise optimal range is the U estimator (Section 6), which is unbiased, nonnegative, and has finite variances. We show that under some conditions on $f$ that are satisfied by natural functions including the exponentiated range, the U estimator is also variance+ optimal.

Order-based optimality: One notion useful for customization is order optimality [6]: An estimator is $\prec^+$-optimal with respect to some partial order $\prec$ on data vectors if any other (nonnegative unbiased) estimator with lower variance on some data $v$ must have strictly higher variance on some data that precedes $v$. Order-based optimality implies variance optimality, but not vice versa. By specifying an order which prioritizes more likely patterns in the data, we can customize the estimator to these patterns.

We show (Section 5) how to construct a $\prec^+$-optimal nonnegative unbiased estimator for any function and order $\prec$ for which the estimator exists. We show that when the data domain is discrete, such estimators always exist, whereas continuous domains require some natural convergence properties of $\prec$. Moreover, the L estimator is $\prec^+$-optimal with respect to the order $z \prec v \iff f(z) < f(v)$. The U estimator, under some conditions, is $\prec^+$-optimal with respect to the reverse order $z \prec v \iff f(z) > f(v)$.

Practical implications: In [TJ, we evaluate Lp difference 
estimators derived as $p$th roots of sums of our L and U estimators for exponentiated range functions $RG_p$ ($p > 0$). We apply the $L_1$ and $L_2$ estimators to samples of data sets with different characteristics: IP flow records exhibited larger differences between the bandwidth usage of a flow key (IP source-destination pair, port, and protocol) at different times. The surnames dataset (frequencies of surnames in published books in different years) had more similar values. Accordingly, the U estimator, which is optimized for large differences, dominated on the IP flow records dataset, whereas the L estimator dominated on the surnames dataset. This demonstrates the potential value in selecting a custom estimator.

More generally, the study shows that we obtain accurate estimates even when only a small fraction of entries is sampled, using either estimator, and also demonstrates the value of the competitiveness of the L estimator: whereas it can outperform the U estimator by orders of magnitude, it can be outperformed only by a small factor.

2. PRELIMINARIES 

We briefly summarize the model and concepts on estimator optimality [6] and coordinated shared-seed sampling [8].

Sampling model: The data is a vector $v = (v_1, v_2, \ldots, v_r) \in \mathbf{V} = V^r$, where $V$ is some subset of the reals. The sampling scheme we use is specified by continuous functions $\tau = (\tau_1, \ldots, \tau_r)$ on $[0, 1]$ with range containing $(\min V, \max V)$. The output of the sampling is the outcome $S \equiv S(u, v)$, which depends on the data and a random seed $u \sim U[0, 1]$. We treat the outcome as a set where the $i$th entry is included in $S$ if and only if $v_i$ is at least $\tau_i(u)$:

$$ i \in S \iff v_i \ge \tau_i(u), $$

but also assume that the seed $u$ is available with it. A special case of particular interest is PPS sampling, where the $\tau_i(u)$ are linear functions: there is a fixed vector $\tau^*$ such that $\tau_i(u) = u \tau_i^*$.

With each outcome $S(u, v)$, we can identify the set $V^*(S)$ of all data vectors that are consistent with it:

$$ V^*(S) \equiv V^*(u, v) = \{ z \mid \forall i \in [r],\ (i \in S \wedge z_i = v_i) \vee (i \notin S \wedge z_i < \tau_i(u)) \} . $$

The set $V^*(u, v)$ is nondecreasing with $u$, which means we have less information on the data when $u$ is larger. For two different outcomes $S_1$ and $S_2$, the sets $V^*(S_1)$ and $V^*(S_2)$ must be either disjoint or one is contained in the other.

For any two vectors $v$ and $z$, the set of $u$ values on which the outcomes $S(u, v)$ and $S(u, z)$ are the same is a suffix of $(0, 1]$ that is open to the left:

$$ \forall \rho \in (0, 1]\ \forall v, \quad z \in V^*(\rho, v) \implies \exists \epsilon > 0,\ \forall x \in (\rho - \epsilon, 1],\ z \in V^*(x, v) \qquad (1) $$

Example 2 Coordinated PPS sampling for Example 1
Consider shared-seed coordinated sampling, where each of the instances 1, 2, 3 is PPS sampled with threshold $\tau^* = 1$. In this particular case, each entry is sampled with probability equal to its value. To coordinate the samples, we draw $u^{(k)} \sim U[0, 1]$, independently for different keys. A key $k$ is sampled in instance $i$ if and only if $v_i^{(k)} \ge u^{(k)}$. The set $V^*(S^{(k)})$ contains all vectors consistent with the sampled entries and with value below $u^{(k)}$ in unsampled entries.

key          a      b      c      d      e      f      g      h
instance 1   0.95   0      0.23   0.70   0.10   0.42   0      0.32
instance 2   0.15   0.44   0      0.80   0.05   0.50   0.20   0
instance 3   0.25   0      0      0.10   0      0.22   0      0
seed u       0.32   0.21   0.04   0.23   0.84   0.70   0.15   0.64

The outcomes for the different keys are: $S^{(a)} = (0.95, *, *)$, $S^{(b)} = (*, 0.44, *)$, $S^{(c)} = (0.23, *, *)$, $S^{(d)} = (0.7, 0.8, *)$, $S^{(e)} = S^{(f)} = S^{(h)} = (*, *, *)$, $S^{(g)} = (*, 0.2, *)$. The sets of vectors consistent with the outcomes are, for example, $V^*(S^{(a)}) = \{0.95\} \times [0, 0.32)^2$ and $V^*(S^{(h)}) = [0, 0.64)^3$.
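The sampling rule of Example 2 is short enough to sketch in code. The following is an illustrative sketch only, with hypothetical names (`pps_outcome`, `is_consistent`); it assumes the data and seeds of Example 2, threshold $\tau^* = 1$, and represents an unsampled entry (shown as $*$ in the text) by `None`.

```python
# Example 2 data and per-key seeds (absent entries are 0).
DATA = {
    "a": (0.95, 0.15, 0.25),
    "b": (0.00, 0.44, 0.00),
    "c": (0.23, 0.00, 0.00),
    "d": (0.70, 0.80, 0.10),
    "e": (0.10, 0.05, 0.00),
    "f": (0.42, 0.50, 0.22),
    "g": (0.00, 0.20, 0.00),
    "h": (0.32, 0.00, 0.00),
}
SEEDS = {"a": 0.32, "b": 0.21, "c": 0.04, "d": 0.23,
         "e": 0.84, "f": 0.70, "g": 0.15, "h": 0.64}


def pps_outcome(values, seed, tau_star=1.0):
    """Shared-seed PPS: entry i is sampled iff v_i >= seed * tau_star."""
    threshold = seed * tau_star
    return tuple(v if v >= threshold else None for v in values)


def is_consistent(z, outcome, seed, tau_star=1.0):
    """Membership test for V*(S): z agrees with the sampled entries and is
    strictly below the threshold on the unsampled ones."""
    threshold = seed * tau_star
    return all(
        (z_i == s_i) if s_i is not None else (z_i < threshold)
        for z_i, s_i in zip(z, outcome)
    )


# The same seed is used for all instances of a key, which coordinates the samples.
OUTCOMES = {k: pps_outcome(DATA[k], SEEDS[k]) for k in DATA}
```

Because the seed is part of the outcome, an unsampled entry still carries information: its value is known to be below the threshold, which is what `is_consistent` encodes.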



Estimators: We are interested in estimating, from the outcome, a function $f : \mathbf{V} \to \mathbb{R}_{\ge 0}$, which maps $\mathbf{V}$ to the nonnegative reals. We apply an estimator $\hat f$ to the outcome (including the seed) and use the notation $\hat f(u, v) \equiv \hat f(S(u, v))$. When the domain is continuous, we assume $\hat f$ is (Lebesgue) integrable.

Two estimators $\hat f_1$ and $\hat f_2$ are equivalent if for all data $v$, $\hat f_1(u, v) = \hat f_2(u, v)$ with probability 1, which is the same as

$$ \hat f_1 \text{ and } \hat f_2 \text{ are equivalent} \iff \forall v\ \forall \rho \in (0, 1],\ \lim_{\eta \to \rho^-} \frac{\int_\eta^\rho \hat f_1(u, v)\,du}{\rho - \eta} = \lim_{\eta \to \rho^-} \frac{\int_\eta^\rho \hat f_2(u, v)\,du}{\rho - \eta} . \qquad (2) $$

An estimator $\hat f$ is nonnegative if $\forall S$, $\hat f(S) \ge 0$ and is unbiased if $\forall v$, $E[\hat f \mid v] = f(v)$. An estimator has finite variance on $v$ if $\int_0^1 \hat f(u, v)^2\,du < \infty$ (the expectation of the square is finite) and is bounded on $v$ if $\sup_{u \in (0, 1]} \hat f(u, v) < \infty$. If a nonnegative estimator is bounded on $v$, it also has finite variance for $v$. An estimator is monotone on $v$ if, when fixing $v$ and considering outcomes consistent with $v$, the estimate value is non-decreasing with the information on the data that we can glean from the outcome, that is, $\hat f(u, v)$ is non-increasing with $u$. We say that an estimator is bounded, has finite variances, or is monotone, if the respective property holds for all $v \in \mathbf{V}$.

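The lower bound function. For $Z \subset \mathbf{V}$, we define $\underline{f}(Z) = \inf\{f(v) \mid v \in Z\}$ as the infimum of $f$ on $Z$. We use the notation $\underline{f}(S) \equiv \underline{f}(V^*(S))$ and $\underline{f}(\rho, v) \equiv \underline{f}(V^*(\rho, v))$. When $v$ is fixed, we use $\underline{f}^{(v)}(u) \equiv \underline{f}(u, v)$. Some properties which we need in the sequel are [8]:

• $\forall v$, $\underline{f}^{(v)}(u)$ is monotone non-increasing and left-continuous. (3)

• $\hat f$ is unbiased and nonnegative $\implies$ (4)

$$ \forall v, \forall \rho,\ \int_\rho^1 \hat f(u, v)\,du \le \underline{f}^{(v)}(\rho) . \qquad (5) $$

The lower bound function $\underline{f}^{(v)}$ and its lower hull $H_f^{(v)}$ are instrumental in capturing existence of estimators with desirable properties [8]:

• $\exists$ unbiased nonnegative $f$ estimator $\iff$ (6)

$$ \forall v \in \mathbf{V},\ \lim_{u \to 0^+} \underline{f}^{(v)}(u) = f(v) . \qquad (7) $$

• If $f$ satisfies (7),

$\exists$ unbiased nonnegative estimator with finite variance for $v$ $\iff$

$$ \int_0^1 \left( \frac{d H_f^{(v)}(u)}{du} \right)^2 du < \infty . \qquad (8) $$

$\exists$ unbiased nonnegative estimator that is bounded on $v$ $\iff$

$$ \lim_{u \to 0^+} \frac{f(v) - \underline{f}^{(v)}(u)}{u} < \infty . \qquad (9) $$

Example 3 illustrates the lower bound functions and respective lower hull for $RG_{p+}$.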

Partially specified estimators. We use partial specifications $\hat f$ of (nonnegative and unbiased) estimators, which are specified on a set of outcomes $\mathcal{S}$ so that

$$ \forall v\ \exists \rho_v \in [0, 1],\quad S(u, v) \in \mathcal{S} \text{ a.e. for } u > \rho_v \ \wedge\ S(u, v) \notin \mathcal{S} \text{ a.e. for } u \le \rho_v . $$

When $\rho_v = 0$, we say that the estimator is fully specified for $v$. We also require that $\hat f$ is nonnegative where specified and satisfies

$$ \forall v,\ \rho_v > 0 \implies \int_{\rho_v}^1 \hat f(u, v)\,du \le \underline{f}^{(v)}(\rho_v) \qquad (10a) $$

$$ \forall v,\ \rho_v = 0 \implies \int_{\rho_v}^1 \hat f(u, v)\,du = f(v) . \qquad (10b) $$

Lemma 2.1. If $f$ satisfies (7) (has a nonnegative unbiased estimator), then any partially specified estimator can be extended to an unbiased nonnegative estimator.

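$v$-optimal extensions and estimators. Given a partially specified estimator $\hat f$ so that $\rho_v > 0$ and $M = \int_{\rho_v}^1 \hat f(u, v)\,du$, a $v$-optimal extension is an extension which is fully specified for $v$ and minimizes variance for $v$ (amongst all such extensions). The $v$-optimal extension is defined on outcomes $S(u, v)$ for $u \in (0, \rho_v]$ and satisfies

$$ \min \int_0^{\rho_v} \hat f(u, v)^2\,du \qquad (11) $$
$$ \text{s.t.}\quad \int_0^{\rho_v} \hat f(u, v)\,du = f(v) - M $$
$$ \forall u,\ \int_u^{\rho_v} \hat f(x, v)\,dx \le \underline{f}^{(v)}(u) - M $$

For $\rho_v \in (0, 1]$ and $M \in [0, \underline{f}^{(v)}(\rho_v)]$, we define the function $\hat f^{(v, \rho_v, M)} : (0, \rho_v] \to \mathbb{R}_{\ge 0}$ as the solution of

$$ \hat f^{(v, \rho_v, M)}(u) = \inf_{0 \le \eta < u} \frac{\underline{f}^{(v)}(\eta) - M - \int_u^{\rho_v} \hat f^{(v, \rho_v, M)}(x)\,dx}{u - \eta} . \qquad (12) $$

Geometrically, the function $\hat f^{(v, \rho_v, M)}$ is the negated derivative of the lower hull of the lower bound function $\underline{f}^{(v)}$ on $(0, \rho_v)$ and the point $(\rho_v, M)$.

Theorem 2.1. Given a partially specified estimator $\hat f$ so that $\rho_v > 0$ and $M = \int_{\rho_v}^1 \hat f(u, v)\,du$, then $\hat f^{(v, \rho_v, M)}$ is the unique (up to equivalence) $v$-optimal extension of $\hat f$.

The $v$-optimal estimates are the minimum variance extension of the empty specification. We use $\rho_v = 1$ and $M = 0$ and obtain $\hat f^{(v)} \equiv \hat f^{(v, 1, 0)}$, the solution of

$$ \hat f^{(v)}(u) = \inf_{0 \le \eta < u} \frac{\underline{f}^{(v)}(\eta) - \int_u^1 \hat f^{(v)}(x)\,dx}{u - \eta} , \qquad (13) $$

which is the negated slope of the lower hull of the lower bound function $\underline{f}^{(v)}$. This is illustrated in Example 3.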
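The geometric description of the $v$-optimal estimates (negated slope of the lower hull of the lower bound function) suggests a simple numeric procedure. The sketch below is an illustration under stated assumptions, not the authors' implementation: it uses the lower bound function of $RG_{p+}$ under PPS sampling with $\tau^* = 1$ from Example 3, discretizes $u$ on a uniform grid, and computes the lower convex hull with a standard monotone-chain scan. For this function the lower bound at $u = 1$ is 0, so the point $(1, M = 0)$ of the empty specification already lies on the curve. All function names are hypothetical.

```python
def lower_bound_rg_plus(u, v1, v2, p):
    """Lower bound function of RG_{p+} for data (v1, v2), PPS with tau* = 1."""
    return max(0.0, v1 - max(v2, u)) ** p


def lower_hull(points):
    """Lower convex hull of points given in increasing x order (monotone chain)."""
    hull = []
    for x, y in points:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Drop the middle point if it lies on or above the chord to (x, y).
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull


def v_optimal_estimate(u, hull):
    """Negated slope of the hull segment that covers u, for u in (0, 1]."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 < u <= x2:
            return -(y2 - y1) / (x2 - x1)
    raise ValueError("u must lie in (0, 1]")


def v_optimal_for(v1, v2, p, n=1000):
    """Return u -> v-optimal estimate for data (v1, v2), on a grid of n + 1 points."""
    points = [(i / n, lower_bound_rg_plus(i / n, v1, v2, p)) for i in range(n + 1)]
    hull = lower_hull(points)
    return lambda u: v_optimal_estimate(u, hull)


est_02 = v_optimal_for(0.6, 0.2, p=1)  # data vector (0.6, 0.2)
est_00 = v_optimal_for(0.6, 0.0, p=1)  # data vector (0.6, 0)
```

For $p = 1$ the hull for $(0.6, 0.2)$ is the chord from $(0, 0.4)$ to $(0.6, 0)$, so `est_02` is about $2/3$ on $(0, 0.6]$, while `est_00` is about $1$ there; both vanish for $u > 0.6$.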

Variance+ and order-based optimality. An estimator is variance+-optimal if there is no nonnegative unbiased estimator with same or lower variance on all data and strictly lower on some data. We also consider order-based optimality with respect to a partial order $\prec$ on $\mathbf{V}$: An estimator $\hat f$ is $\prec^+$-optimal if there is no other nonnegative unbiased estimator with strictly lower variance on some data $v$ and at most the variance of $\hat f$ on all vectors that precede $v$. Order-based optimality (with respect to some $\prec$) implies variance optimality but the converse is generally not true [6].

Variance competitiveness. An estimator $\hat f$ is $c$-competitive if

$$ \forall v,\ \int_0^1 \hat f(u, v)^2\,du \le c \inf_{\hat f'} \int_0^1 \hat f'(u, v)^2\,du , $$

where the infimum is over all unbiased nonnegative estimators of $f$. When the estimator is unbiased, the expectation of the square is closely related to variance, and an estimator
Example 3 Lower bound function and its lower hull

Consider $RG_{p+}(v_1, v_2) = \max\{0, v_1 - v_2\}^p$ (see Example 1) over the domain $\mathbf{V} = [0, 1]^2$ and PPS sampling with $\tau_1^* = \tau_2^* = 1$ (as in Example 2). The lower bound function for data $v = (v_1, v_2)$ is

$$ \underline{RG_{p+}}(u, v) = \max\{0, v_1 - \max\{v_2, u\}\}^p . $$

The figures illustrate $\underline{RG_{p+}}^{(v)}(u)$ and its lower hull for the data vectors $(0.6, 0.2)$ and $(0.6, 0)$ and $p \in \{0.5, 1, 2\}$. For $u > 0.2$, the outcome when sampling both vectors is the same, and thus the lower bound function is the same. For $u \le 0.2$, the outcomes diverge. For $p \le 1$, $\underline{RG_{p+}}^{(v)}(u)$ is concave and the lower hull is linear on $(0, v_1]$. For $p > 1$, the lower hull coincides with $\underline{RG_{p+}}^{(v)}(u)$ on some interval $(a, v_1]$ and is linear on $(0, a]$. When $v_2 = 0$, $\underline{RG_{p+}}^{(v)}(u)$ is equal to its lower hull.

[Figure: three panels, "RGp+ p=0.5, PPS tau=1, LB CH", "RGp+ p=1, PPS tau=1, LB CH", and "RGp+ p=2, PPS tau=1, LB CH". Curves: v1=0.6 v2=0 LB, CH; v1=0.6 v2=0.2 LB; v1=0.6 v2=0.2 CH.]

The $v$-optimal estimates are the negated slopes of the lower hulls. They are 0 when $u \in (0.6, 1]$, since these outcomes are consistent with data on which $RG_{p+} = 0$. They are constant for $u \in (0, v_1]$ when $p \le 1$. Observe that for $u \in (0.2, 0.6]$, the $v$-optimal estimates are different even though the outcome of sampling the two vectors is the same, which demonstrates that it is not possible to simultaneously minimize the variance for the two vectors.



that minimizes one also minimizes the other.

$$ \mathrm{VAR}[\hat f \mid v] = \int_0^1 \hat f(u, v)^2\,du - f(v)^2 \qquad (14) $$







3. THE POINTWISE OPTIMAL RANGE 

We say that an estimator $\hat f$ is $v$-optimal at an outcome $S(u, v)$ if it satisfies (13). For an outcome $S(\rho, v)$, we are interested in the range of $z$-optimal estimates at $S$ for all $z \in V^*(S)$, with respect to a value $M$, which captures the contribution to the expectation of the estimator made by outcomes which are less informative than $S$.

$$ \lambda(\rho, v, M) = \inf_{0 \le \eta < \rho} \frac{\underline{f}(\eta, v) - M}{\rho - \eta} \qquad (15) $$

$$ \lambda_U(\rho, v, M) \equiv \lambda_U(S, M) = \sup_{z \in V^*(\rho, v)} \lambda(\rho, z, M) \qquad (16) $$

$$ \lambda_L(\rho, v, M) \equiv \lambda_L(S, M) = \inf_{z \in V^*(\rho, v)} \lambda(\rho, z, M) = \inf_{z \in V^*(\rho, v)} \inf_{0 \le \eta < \rho} \frac{\underline{f}(\eta, z) - M}{\rho - \eta} = \frac{\underline{f}(\rho, v) - M}{\rho} \qquad (17) $$

To verify equality (17), observe that from left continuity of the lower bound function, $\inf_{\eta < \rho,\ z \in V^*(S)} \underline{f}(\eta, z) = \underline{f}(\rho, v)$, and that the denominator $\rho - \eta$ is maximized at $\eta = 0$.

$\lambda(\rho, v, M)$ is the $v$-optimal estimate at $\rho$, given a specification of the estimator $\hat f(u, v)$ for $u \in (\rho, 1]$ with $\int_\rho^1 \hat f(u, v)\,du = M$. In short, we refer to $\lambda(\rho, v, M)$ as the $v$-optimal estimate at $\rho$ given $M$. Geometrically, $\lambda(\rho, v, M)$ is the negated slope of the lower hull of $\underline{f}^{(v)}$ and the point $(\rho, M)$. $\lambda_U(S, M)$ and $\lambda_L(S, M)$, respectively, are the supremum and infimum of the range of $z$-optimal estimates at $S$ given $M$. Figure 1 illustrates an outcome $S$ and the optimal range at $S$ given $M$. We can see how the lower endpoint of the range is realized by a vector with $f$ value equal to the lower bound at $S$, as in equality (17).

Figure 1: Lower bound functions for vectors $v, z, w$. Outcomes are consistent for all $x \ge u$: $S(x, v) = S(x, z) = S(x, w) = S_x$. The figure illustrates the $y$-optimal estimates $\lambda(u, y, M)$ at $u$ given $M$ for $y \in \{v, z, w\}$. The estimates are the negated slopes of the lower hull of the point $(u, M)$ and the lower bound function $\underline{f}^{(y)}$. The optimal range at $S_u$ given $M$ is lower-bounded by $w$, that is $\lambda_L(S_u, M) = \lambda(u, w, M)$, and upper-bounded by $v$, $\lambda_U(S_u, M) = \lambda(u, v, M)$. The figure illustrates the general property that the optimal range is lower bounded by the $w$ which satisfies $f(w) = \underline{f}(u, w)$.

When $\hat f$ is given for $u \in (\rho, 1]$, we use $M = \int_\rho^1 \hat f(u, v)\,du$ and abbreviate the notations by removing $M$ to $\lambda(\rho, v)$, $\lambda_U(S)$, and $\lambda_L(S)$.
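To make the optimal range concrete, the quantities in (15) to (17) can be approximated on a grid. The sketch below is illustrative only and rests on several assumptions: the function is $RG_{p+}$ under PPS sampling with $\tau^* = 1$ (Example 3), the outcome has entry 1 sampled with value $v_1 \ge \rho$ and entry 2 unsampled, so consistent vectors are $(v_1, z_2)$ with $z_2 \in [0, \rho)$, and the infimum and supremum are replaced by a minimum and maximum over finite grids. The names `lam`, `lam_lower`, and `lam_upper` are hypothetical.

```python
def lower_bound_rg_plus(u, v1, v2, p):
    """Lower bound function of RG_{p+} for data (v1, v2), PPS with tau* = 1."""
    return max(0.0, v1 - max(v2, u)) ** p


def lam(rho, v, M, p=1, n=2000):
    """Grid approximation of (15): inf over 0 <= eta < rho of
    (lower_bound(eta, v) - M) / (rho - eta)."""
    v1, v2 = v
    best = float("inf")
    for i in range(n):
        eta = rho * i / n
        best = min(best, (lower_bound_rg_plus(eta, v1, v2, p) - M) / (rho - eta))
    return best


def lam_lower(rho, v, M, p=1):
    """Closed form (17): (lower_bound(rho, v) - M) / rho."""
    v1, v2 = v
    return (lower_bound_rg_plus(rho, v1, v2, p) - M) / rho


def lam_upper(rho, v1, M, p=1, m=60):
    """Grid approximation of (16) for an outcome where only entry 1 is sampled:
    the consistent vectors are (v1, z2) with z2 in [0, rho)."""
    return max(lam(rho, (v1, rho * j / m), M, p) for j in range(m))


# Outcome at seed 0.6 for data (0.6, 0.2) and for (0.6, 0): entry 1 sampled only.
lam_v = lam(0.6, (0.6, 0.2), 0.0)
lam_z = lam(0.6, (0.6, 0.0), 0.0)
range_low = lam_lower(0.6, (0.6, 0.2), 0.0)
range_high = lam_upper(0.6, 0.6, 0.0)
```

Under these assumptions the two data vectors share the outcome at seed 0.6 but have different optimal estimates (about $2/3$ and $1$), both inside the range $[0, 1]$ returned by `lam_lower` and `lam_upper`.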

We say that the estimator $\hat f$ is in-range (in the optimal range) at outcome $S(\rho, v)$ if

$$ \lambda_L(S) \le \hat f(S) \le \lambda_U(S) \qquad (18) $$

Writing (18) explicitly, we obtain

$$ \hat f(\rho, v) \ge \lambda_L(\rho, v) = \frac{\underline{f}(\rho, v) - \int_\rho^1 \hat f(u, v)\,du}{\rho} \qquad (19a) $$

$$ \hat f(\rho, v) \le \lambda_U(\rho, v) = \sup_{z \in V^*(S)} \inf_{0 \le \eta < \rho} \frac{\underline{f}(\eta, z) - \int_\rho^1 \hat f(u, v)\,du}{\rho - \eta} \qquad (19b) $$

Two special solutions that we study are the L estimator ($\hat f^{(L)}$, see Section 4) and the U estimator ($\hat f^{(U)}$, see Section 6), which respectively solve (19a) and (19b) with equalities. For all $\rho \in (0, 1]$ and $v$, $\hat f^{(L)}$ minimizes and $\hat f^{(U)}$ maximizes $\int_\rho^1 \hat f(u, v)\,du$ among all solutions of (18).

We show that being in-range (satisfying (18) for all outcomes $S$) is sufficient for nonnegativity and unbiasedness.



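Lemma 3.1. If $f$ satisfies (7) then any in-range estimator is unbiased and nonnegative.

Proof. For nonnegativity, it suffices to show that a solution of (18) satisfies (5), since (19a) and (5) together imply nonnegativity. Assume to the contrary that a solution $\hat f$ violates (5) and let $\rho$ be the supremum of $x$ satisfying $\int_x^1 \hat f(u, v)\,du > \underline{f}(x, v)$. From (3), which is monotonicity and left-continuity of $\underline{f}(x, v)$, we have $\int_\rho^1 \hat f(u, v)\,du = \underline{f}(\rho, v)$. Since $\int_x^1 \hat f(u, v)\,du$ is continuous in $x$, and $\underline{f}^{(v)}$ is left-continuous, there must be $\delta > 0$ so that

$$ \forall x \in [\rho - \delta, \rho),\ \int_x^1 \hat f(u, v)\,du > \underline{f}(x, v) . \qquad (20) $$

Let $x \in [\rho - \delta, \rho)$ and $M(x) = \int_x^1 \hat f(u, v)\,du$. From (20), $M(x) > \underline{f}(x, v)$. We have that

$$ \hat f(x, v) \le \sup_{z \in V^*(x, v)} \inf_{0 \le \eta < x} \frac{\underline{f}(\eta, z) - M(x)}{x - \eta} < \sup_{z \in V^*(x, v)} \inf_{0 \le \eta < x} \frac{\underline{f}(\eta, z) - \underline{f}(x, v)}{x - \eta} $$
$$ \le \sup_{z \in V^*(x, v)} \lim_{\eta \to x^-} \frac{\underline{f}(\eta, z) - \underline{f}(x, v)}{x - \eta} = \lim_{\eta \to x^-} \frac{\underline{f}(\eta, v) - \underline{f}(x, v)}{x - \eta} = -\frac{d \underline{f}(x, v)}{dx} . $$

Since this holds for all $x \in (\rho - \delta, \rho)$, we obtain that $\int_{\rho - \delta}^{\rho} \hat f(x, v)\,dx < \underline{f}(\rho - \delta, v) - \underline{f}(\rho, v)$. Therefore, $\int_{\rho - \delta}^1 \hat f(x, v)\,dx < \underline{f}(\rho - \delta, v)$, which contradicts (20).

We now establish unbiasedness. Using (19a), and $\underline{f}(u, v)$ being non-increasing in $u$, we obtain that $\forall u\ \forall \rho > u$,

$$ \hat f(u, v) \ge \frac{\underline{f}(u, v) - \int_u^1 \hat f(x, v)\,dx}{u} \ge \frac{\underline{f}(\rho, v) - \int_u^1 \hat f(x, v)\,dx}{u} . \qquad (21) $$

We argue that

$$ \forall v\ \forall \rho > 0,\ \lim_{x \to 0} \int_x^1 \hat f(u, v)\,du \ge \underline{f}(\rho, v) . \qquad (22) $$

To prove (22), define $\Delta(x) = \underline{f}(\rho, v) - \int_x^1 \hat f(u, v)\,du$ for $x \in (0, \rho]$. We show that $\int_{x/2}^{x} \hat f(u, v)\,du \ge \Delta(x)/4$. To see this, assume to the contrary that $\int_y^x \hat f(u, v)\,du < \Delta(x)/4$ for all $y \in [x/2, x]$. Then from (21), the value of $\hat f(u, v)$ for $u \in [x/2, x]$ must be at least $(3/4)\Delta(x)/x$. Hence, the integral over the interval $[x/2, x]$ is at least $(3/8)\Delta(x)$, which is a contradiction. We can now apply this iteratively, obtaining that $\Delta(\rho/2^i) \le (3/4)^i \Delta(\rho)$. Thus, the gap $\Delta(x)$ diminishes as $x \to 0$ and we established (22).

Since (22) holds for all $\rho > 0$, then $\lim_{u \to 0} \int_u^1 \hat f(x, v)\,dx \ge \lim_{u \to 0} \underline{f}(u, v) = f(v)$ (using (7)). Combining with (5), which we already established, we obtain $\lim_{u \to 0} \int_u^1 \hat f(x, v)\,dx = f(v)$. □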

We next show that being in-range is necessary for optimality. For our analysis of order-based optimality (Section 5), we need to slightly refine the notion of variance+-optimality to be with respect to a partially specified estimator $\hat f$ and a subset of data vectors $Z \subset \mathbf{V}$.

An extension of $\hat f$ that is fully specified for all vectors in $Z$ is variance+-optimal on $Z$ if any other extension with strictly lower variance on at least one $v \in Z$ has a strictly higher variance on at least one $z \in Z$. We say that a partial specification is in-range with respect to $Z$ if:

$$ \forall v \in Z, \text{ for } \rho \in (0, \rho_v] \text{ almost everywhere,}\quad \inf_{z \in Z \cap V^*(\rho, v)} \lambda(\rho, z) \le \hat f(\rho, v) \le \sup_{z \in Z \cap V^*(\rho, v)} \lambda(\rho, z) \qquad (23) $$

Using (2), (23) is the same as requiring that $\forall v\ \forall \rho \in (0, \rho_v]$, when fixing the estimator on $S(u, v)$ for $u > \rho$, then

$$ \inf_{z \in Z \cap V^*(\rho, v)} \lambda(\rho, z) \le \lim_{\eta \to \rho^-} \frac{\int_\eta^\rho \hat f(u, v)\,du}{\rho - \eta} \le \sup_{z \in Z \cap V^*(\rho, v)} \lambda(\rho, z) \qquad (24) $$

We show that a necessary condition for variance+-optimality with respect to a partial specification and $Z$ is that almost everywhere, estimates for outcomes consistent with vectors in $Z$ are in-range for $Z$. Formally:

Theorem 3.1. An extension is variance+-optimal on $Z$ only if (23) holds.

Proof. Consider a (nonnegative unbiased) estimator $\hat f$ that violates (23) for some $v \in Z$ and $\rho$. We show that there is an alternative estimator, equal to $\hat f(u, v)$ on outcomes $u > \rho$ and satisfying (23) at $\rho$, that has strictly lower variance than $\hat f$ on all vectors $Z \cap V^*(\rho, v)$. This will show that $\hat f$ is not variance+-optimal on $Z$.

The estimator $\hat f$ violates (24), so either

$$ \lim_{\eta \to \rho^-} \frac{\int_\eta^\rho \hat f(u, v)\,du}{\rho - \eta} < \inf_{z \in Z \cap V^*(\rho, v)} \lambda(\rho, z) \equiv L \qquad (25) $$

or

$$ \lim_{\eta \to \rho^-} \frac{\int_\eta^\rho \hat f(u, v)\,du}{\rho - \eta} > \sup_{z \in Z \cap V^*(\rho, v)} \lambda(\rho, z) \equiv U . \qquad (26) $$

Violation (26), for a nonnegative unbiased $\hat f$, means that $M = \int_\rho^1 \hat f(u, v)\,du < \underline{f}(\rho, v)$. Consider $z \in Z \cap V^*(\rho, v)$ and the $z$-optimal extension, $\hat f^{(z, \rho, M)}$ (see Theorem 2.1). Because the point $(\rho, M)$ lies strictly below $\underline{f}^{(z)}$, the lower hull of both the point and $\underline{f}^{(z)}$ has a linear piece on some interval with right end point $\rho$. More precisely, $\hat f^{(z, \rho, M)}(u) \equiv \lambda(\rho, z, M)$ on $S(u, z)$ at some nonempty interval $u \in (\eta_z, \rho]$ so that at the point $\eta_z$, the lower bound is met, that is, $M + (\rho - \eta_z)\lambda(\rho, z, M) = \lim_{u \to \eta_z^+} \underline{f}(u, z)$. Therefore, all extensions (maintaining nonnegativity and unbiasedness) must satisfy

$$ \int_{\eta_z}^{\rho} \hat f(u, z)\,du \le \lim_{u \to \eta_z^+} \underline{f}(u, z) - M = (\rho - \eta_z)\lambda(\rho, z, M) \le (\rho - \eta_z) U . \qquad (27) $$

From (26), for some $\epsilon > 0$, $\hat f$ has average value strictly higher than $U$ on $S(u, v)$ for all $u$ in intervals $(\eta, \rho]$ for $\eta \in [\rho - \epsilon, \rho)$. For each $z \in Z \cap V^*(\rho, v)$ we define $\zeta_z$ as the maximum of $\rho - \epsilon$ and $\inf\{u \mid S(u, v) = S(u, z)\}$. From (1), $\zeta_z < \rho$. For each $z$, the higher estimate values on $S(u, z)$ for $u \in (\zeta_z, \rho]$ must be "compensated for" by lower values on $u \in (\eta_z, \zeta_z)$ (from nonnegativity we must have $\eta_z < \zeta_z$) so that (27) holds. By modifying the estimator to be equal to $U$ for all outcomes $S(u, v)$, $u \in (\rho - \epsilon, \rho]$, and correspondingly increasing some estimate values that are lower than $U$ to $U$ on $S(u, z)$ for $u \in (\eta_z, \zeta_z)$, we obtain an estimator with strictly lower variance than $\hat f$ for all $z \in Z \cap V^*(\rho, v)$ and the same variance as $\hat f$ on all other vectors. Note we can perform the shift consistently across all branches of the tree-like partial order on outcomes.

Violation (25) means that for some $\epsilon > 0$, $\hat f$ has average value strictly lower than $L$ on $S(u, v)$ for all intervals $u \in (\eta, \rho]$ for $\eta \in [\rho - \epsilon, \rho)$. For all $z$, the $z$-optimal extension $\hat f^{(z, \rho, M)}(u)$ has value $\lambda(\rho, z, M) \ge L$ at $\rho$ and (from convexity of the lower hull) values that are at least that on $u < \rho$. From unbiasedness, we must have for all $z \in Z \cap V^*(\rho, v)$, $\int_0^\rho \hat f(u, z)\,du = \int_0^\rho \hat f^{(z, \rho, M)}(u)\,du$. Therefore, values lower than $L$ must be compensated for in $\hat f$ by values higher than $L$. We can modify the estimator such that it is equal to $L$ for $S(u, v)$ for $u \in (\rho - \epsilon, \rho)$ and compensate for that by lowering values at lower $u$ values that are higher than $L$. The modified estimator has strictly lower variance than $\hat f$ for all $z \in Z \cap V^*(\rho, v)$ and the same variance as $\hat f$ on all other vectors. □



4. THE L ESTIMATOR 

The estimator $\hat f^{(L)}$ satisfies (19a) with equalities, obtaining values that are minimum in the pointwise optimal range. Its values on outcomes consistent with data $v$ only depend on the lower bound function on outcomes consistent with $v$. Geometrically, as visualized in Figure 2, the L estimate on an outcome $S(\rho, v)$ is exactly the slope value that, if maintained for outcomes $S(u, v)$ ($u \in (0, \rho]$), would yield an expected estimate of $\underline{f}(S)$. We derive a convenient expression for this estimator and show that it is 4-competitive and that it is the unique variance+ optimal monotone estimator. We also show it is order-optimal with respect to the natural order that prioritizes data vectors with lower $f(v)$.

Figure 2: An example lower bound function $\underline{f}^{(v)}(u)$ with 3 steps and the respective cumulative L estimate $\int_u^1 \hat f^{(L)}(x, v)\,dx$. The estimate $\hat f^{(L)}$ is the negated slope and in this case is also a step function with 3 steps.

The L estimator is the solution of the integral equation 
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e V, Vp G [0, 1]: 



(28) 



We solve the integral equations to obtain an explicit form 
of the estimator Z^^-*. 

Lemma 4.1. 

(29) 



1 

— ^ — du (30) 

u 

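Proof. We show that (29) is a solution of (28). To ease calculations, we use $h(x) = \int \frac{\underline{f}^{(v)}(x)}{x^2}\,dx$. We have

$$\hat{f}^{(L)}(\rho, v) = \frac{\underline{f}^{(v)}(\rho)}{\rho} - h(1) + h(\rho) \tag{31}$$

We need to show

$$\underline{f}^{(v)}(\rho) = \rho\,\hat{f}^{(L)}(\rho, v) + \int_\rho^1 \hat{f}^{(L)}(x, v)\,dx .$$

Substituting (31) we get

$$\underline{f}^{(v)}(\rho) = \underline{f}^{(v)}(\rho) - \rho\,(h(1) - h(\rho)) + \int_\rho^1 \left( \frac{\underline{f}^{(v)}(x)}{x} - h(1) + h(x) \right) dx$$

$$\rho\,(h(1) - h(\rho)) = \int_\rho^1 \left( \frac{\underline{f}^{(v)}(x)}{x} + h(x) \right) dx - (1 - \rho)\,h(1)$$

Rearranging and canceling identical terms

$$h(1) - \rho\,h(\rho) = \int_\rho^1 \left( \frac{\underline{f}^{(v)}(x)}{x} + h(x) \right) dx \tag{32}$$

Using $(h(x)x)' = x h'(x) + h(x)$, and substituting $h'(x)\,x = \underline{f}^{(v)}(x)/x$, we obtain that $h(x)x = \int \left( \underline{f}^{(v)}(x)/x + h(x) \right) dx$. Substituting in (32) we get that we need to show

$$h(1) - \rho\,h(\rho) = h(x)\,x \,\Big|_\rho^1 = h(1) - \rho\,h(\rho) .$$

Lastly, the lower bound function $\underline{f}^{(v)}(u)$ is monotone on $(0, 1]$ and thus differentiable almost everywhere. Thus, $\frac{d\underline{f}^{(v)}(u)}{du}$ is defined almost everywhere. We get (30) from (29) using integration by parts. □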
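As a sanity check of Lemma 4.1, the following minimal sketch evaluates the explicit form (29) for a step lower bound function (as in Figure 2) and compares the right-hand side of the integral equation (28) with the lower bound value. The step function, the helper names, and the midpoint-rule integration are illustrative assumptions, not part of the paper.

```python
def lower_bound(steps, u):
    """Value at u of a non-increasing step function given as (right_endpoint, value) pairs."""
    for right, value in steps:
        if u <= right:
            return value
    raise ValueError("u must lie in (0, 1]")


def l_estimate(steps, rho):
    """L estimate at seed rho, following the explicit form (29)."""
    tail = 0.0  # integral over (rho, 1] of lower_bound(u) / u**2
    left = 0.0
    for right, value in steps:
        a = max(left, rho)
        if right > a:
            # exact integral of value / u**2 over (a, right]
            tail += value * (1.0 / a - 1.0 / right)
        left = right
    return lower_bound(steps, rho) / rho - tail


def integral_equation_rhs(steps, rho, n=20000):
    """Right-hand side of (28): rho * L(rho) plus the integral of L over (rho, 1]."""
    width = (1.0 - rho) / n
    area = sum(l_estimate(steps, rho + (k + 0.5) * width) for k in range(n)) * width
    return rho * l_estimate(steps, rho) + area


# Hypothetical 3-step lower bound: 3 on (0, 0.25], 2 on (0.25, 0.5], 1 on (0.5, 1].
steps = [(0.25, 3.0), (0.5, 2.0), (1.0, 1.0)]
for rho in (0.1, 0.3, 0.7):
    print(rho, l_estimate(steps, rho), integral_equation_rhs(steps, rho), lower_bound(steps, rho))
```

For a step lower bound function the L estimate computed this way is itself a step function, which agrees with the caption of Figure 2.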

We show a tight bound of 4 for the competitive ratio of $\hat{f}^{(L)}$, meaning that it is at most 4 for all functions $f$ and, for any $\epsilon > 0$, there exists a function $f$ on which the ratio is no less than $4 - \epsilon$.

Theorem 4.1.

$$\sup_{f, v \,\mid\, \int_0^1 \hat{f}^{(v)}(u)^2 du < \infty} \ \frac{\int_0^1 \hat{f}^{(L)}(u, v)^2\,du}{\int_0^1 \hat{f}^{(v)}(u)^2\,du} = 4 .$$



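We first present a family of functions for which the supremum of this ratio is 4. Consider the domain $[0, 1]^2$, $\tau(u) = u$, and a function $f$ parameterized by $\alpha < 0.5$. For the vector $v = (0, 0)$ we have the following convex lower bound function

$$\underline{f}^{(v)}(u) = \frac{1}{1 - \alpha} - \frac{u^{1 - \alpha}}{1 - \alpha} .$$

Alternatively, the same lower bound function also corresponds to data $v = (1, 0)$ and a correspondingly defined function $g(v)$.

Being convex, this lower bound function is equal to its lower hull. Therefore, by taking its negated derivative, we get $\hat{f}^{(v)}(u) = 1/u^{\alpha}$. The function $\hat{f}^{(v)}$ is square integrable when $\alpha < 0.5$:

$$\int_0^1 \frac{1}{u^{2\alpha}}\,du = \frac{1}{1 - 2\alpha} .$$

From (30), the L estimator on outcomes consistent with $v$ is

$$\hat{f}^{(L)}(x, v) = \frac{1}{\alpha} \left( \frac{1}{x^{\alpha}} - 1 \right) .$$

Hence,

$$\int_0^1 \hat{f}^{(L)}(u, v)^2\,du = \frac{1}{\alpha^2} \int_0^1 \left( \frac{1}{u^{\alpha}} - 1 \right)^2 du = \frac{1}{\alpha^2} \left( \frac{1}{1 - 2\alpha} - \frac{2}{1 - \alpha} + 1 \right) = \frac{2}{(1 - 2\alpha)(1 - \alpha)} .$$

We obtain the ratio

$$\frac{\int_0^1 \hat{f}^{(L)}(u, v)^2\,du}{\int_0^1 \hat{f}^{(v)}(u)^2\,du} = \frac{2}{1 - \alpha} < 4 .$$

The ratio approaches 4 when $\alpha \to 0.5^-$.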
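The ratio for this family can be cross-checked numerically. The sketch below integrates the squared L estimate $(u^{-\alpha} - 1)/\alpha$ and the squared $v$-optimal estimate $u^{-\alpha}$ and compares their ratio with $2/(1-\alpha)$. The function names and the substitution $u = t^4$ (used only to tame the singularity at 0, and adequate for $\alpha \le 0.25$) are assumptions of this illustration.

```python
def second_moments(alpha, n=100000):
    """Midpoint-rule integrals of the squared L and v-optimal estimates over (0, 1].

    The substitution u = t**4 keeps the transformed integrands bounded for alpha <= 0.25.
    """
    dt = 1.0 / n
    l_sq = 0.0    # integral of (L estimate)^2
    opt_sq = 0.0  # integral of (v-optimal estimate)^2
    for k in range(n):
        t = (k + 0.5) * dt
        u = t ** 4
        jac = 4.0 * t ** 3                     # du = 4 t^3 dt
        opt = u ** (-alpha)                    # v-optimal estimate 1 / u^alpha
        l_est = (u ** (-alpha) - 1.0) / alpha  # L estimate (1/alpha)(1/u^alpha - 1)
        opt_sq += opt * opt * jac * dt
        l_sq += l_est * l_est * jac * dt
    return l_sq, opt_sq


def closed_form_ratio(alpha):
    """Ratio 2 / (1 - alpha) obtained in the text."""
    return 2.0 / (1.0 - alpha)


l_sq, opt_sq = second_moments(0.25)
print(l_sq / opt_sq, closed_form_ratio(0.25))

# The closed-form ratio stays below 4 and approaches it as alpha tends to 0.5.
for alpha in (0.4, 0.49, 0.499):
    print(alpha, closed_form_ratio(alpha))
```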

We conclude the proof of Theorem 4.1 using the following lemma, which shows that if $\hat{f}^{(v)}(u)$ is square integrable, then $\hat{f}^{(L)}(u, v)$ is also square integrable and the ratio between these integrals is at most 4.

Lemma 4.2.

$$\int_0^1 \hat{f}^{(v)}(u)^2\,du < \infty \implies \frac{\int_0^1 \hat{f}^{(L)}(u, v)^2\,du}{\int_0^1 \hat{f}^{(v)}(u)^2\,du} \le 4 .$$

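Proof. The function $\hat{f}^{(v)}$ only depends on the lower hull of the lower bound function $\underline{f}^{(v)}(u)$. The estimator $\hat{f}^{(L)}$ depends on the lower bound function $\underline{f}^{(v)}$ and can be different for different lower bound functions with the same lower hull. Fixing the lower hull, the variance of the L estimator is maximized for $\underline{f}^{(v)}$ that is equal to its lower hull. It therefore suffices to consider convex $\underline{f}^{(v)}(u)$, that is, lower bound functions that are equal to their lower hull, for which we have

$$\hat{f}^{(v)}(u) = -\frac{d\underline{f}^{(v)}(u)}{du} .$$

Recall that $\hat{f}^{(v)}(u)$ is monotone non-increasing. From (30),

$$\hat{f}^{(L)}(\rho, v) = \int_\rho^1 \frac{\hat{f}^{(v)}(u)}{u}\,du .$$

To establish our claim, it suffices to show that for all monotone non-increasing functions $g$,

$$\frac{\int_0^1 \left( \int_x^1 \frac{g(u)}{u}\,du \right)^2 dx}{\int_0^1 g(x)^2\,dx} \le 4 . \tag{33}$$

Define $h(x) = \int_x^1 \frac{g(u)}{u}\,du$, so that $h(1) = 0$ and $h'(y) = -g(y)/y$. Then

$$\int_\epsilon^1 h^2(x)\,dx = -\int_\epsilon^1 \int_x^1 2h(y)h'(y)\,dy\,dx = -\int_\epsilon^1 \int_\epsilon^y 2h(y)h'(y)\,dx\,dy = -2 \int_\epsilon^1 h(y)h'(y) \int_\epsilon^y dx\,dy$$

$$= -2 \int_\epsilon^1 h(y)h'(y)(y - \epsilon)\,dy = 2 \int_\epsilon^1 h(y)\,\frac{g(y)}{y}\,(y - \epsilon)\,dy \le 2 \int_\epsilon^1 h(y)g(y)\,dy$$

$$\le 2 \sqrt{\int_\epsilon^1 h^2(y)\,dy}\ \sqrt{\int_\epsilon^1 g^2(y)\,dy} .$$

The last inequality is Cauchy-Schwarz. To obtain (33) we divide both sides by $\sqrt{\int_\epsilon^1 h^2(y)\,dy}$ and take the limit as $\epsilon$ goes to 0. □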
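The inequality (33) can also be probed numerically for a few monotone non-increasing functions $g$. The sketch below is only an illustration under stated assumptions: the test functions, the grid size, and the helper name `ratio_33` are hypothetical, and the integrals are approximated by a midpoint rule with a running sum for $h$.

```python
def ratio_33(g, n=20000):
    """Approximate the left-hand side of (33) for a nonnegative function g on (0, 1]."""
    dx = 1.0 / n
    h = 0.0    # running value of h(x) = integral of g(u)/u over (x, 1]
    num = 0.0  # integral of h(x)^2
    den = 0.0  # integral of g(x)^2
    for k in reversed(range(n)):
        x = (k + 0.5) * dx
        half = 0.5 * g(x) / x * dx
        h += half          # advance from the right cell edge to the midpoint
        num += h * h * dx
        den += g(x) ** 2 * dx
        h += half          # advance from the midpoint to the left cell edge
    return num / den


examples = {
    "constant": lambda x: 1.0,
    "linear": lambda x: 1.0 - x,
    "step": lambda x: 2.0 if x < 0.3 else 1.0,
}
for name, g in examples.items():
    print(name, ratio_33(g))
```

For the constant function the exact ratio is $\int_0^1 \ln^2 x\,dx = 2$, and for $g(x) = 1 - x$ it is $2.5$; both are well below the bound of 4.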



Theorem 4.2. The estimator $\hat{f}^{(L)}$ is monotone. Moreover, it is the unique variance+-optimal monotone estimator and dominates all monotone estimators.

Proof. Recall that an estimator $\hat{f}$ is monotone if and only if, for any data $v$, the estimate $\hat{f}(\rho, v)$ is non-increasing with $\rho$. To show monotonicity of the L estimator, we rewrite (29) to obtain

$$\hat{f}^{(L)}(\rho, v) = \underline{f}^{(v)}(\rho) + \int_\rho^1 \frac{\underline{f}^{(v)}(\rho) - \underline{f}^{(v)}(x)}{x^2}\,dx , \tag{34}$$

which is clearly non-increasing with $\rho$.

We now show that $\hat{f}^{(L)}$ dominates all monotone estimators (and hence is the unique variance+-optimal monotone estimator). By definition, a monotone estimator $\hat{f}$ must satisfy the inequalities $\forall v, \forall \rho \in (0, 1]$:

$$\rho\,\hat{f}(\rho, v) + \int_\rho^1 \hat{f}(u, v)\,du \le \inf_{z \in V^*(\rho, v)} \int_0^1 \hat{f}(u, z)\,du = \inf_{z \in V^*(\rho, v)} f(z) \equiv \underline{f}(\rho, v) . \tag{35}$$

Estimator $\hat{f}^{(L)}$ satisfies (35) with equalities. If there is a monotone estimator $\hat{f}$ with an inequality which is not tight, we can obtain a monotone estimator that strictly dominates $\hat{f}$ by decreasing the estimate for $u < \rho$ and increasing it for $u > \rho$. The variance decreases because we decrease the estimate on higher values and increase it on lower values. □



Lastly, we show that $\hat{f}^{(L)}$ is order-based variance optimal with respect to the order $\prec$ which prioritizes vectors with lower $f(v)$:

Theorem 4.3. A $\prec^+$-optimal estimator for $f$ with respect to the partial order

$$v \prec v' \iff f(v) < f(v')$$

must be equivalent to $\hat{f}^{(L)}$.

Proof. We use our results of order-based optimality (Section 5). We can check that we obtain (28) using (42) and $\prec$ as defined in the statement of the Theorem. Thus, a $\prec^+$-optimal solution must have this form. □

The L estimator may not be bounded (see Example 4).
An estimator that is both bounded and competitive (but not
necessarily in-range) is the J estimator [8]. For any $\epsilon > 0$,
the estimator satisfying = mm{{l + e) \ ^ (S), Xu{S)} 
is in-range, is bounded, if (|9]l holds, and is competitive, if dHJ 
holds. 

5. ORDER-BASED OPTIMALITY 

We identify conditions on $f$ and $\prec$ under which a $\prec^+$-optimal estimator exists and specify this estimator as a solution of a set of equations. Our derivations of $\prec^+$-optimal estimators follow the intuition to require the estimate on an outcome $S$ to be $v$-optimal with respect to the $\prec$-minimal consistent vector:

$$\forall S = S(\rho, v), \quad \hat{f}(S) = \lambda(\rho, \min_\prec(V^*(S))) . \tag{36}$$

When $\prec$ is a total order and $V$ is finite, $\min_\prec(V^*(S))$ is unique and (36) is well defined. Moreover, as long as $f$ has a nonnegative unbiased estimator, a solution of (36) always exists and is $\prec^+$-optimal. We preview a simple construction of the solution: Process vectors in increasing $\prec$ order, iteratively building a partially defined nonnegative estimator. When processing $v$, the estimator is already defined for $S(u, v)$ for $u > \rho_v$, for some $\rho_v \in (0, 1]$. We extend it to the outcomes $S(u, v)$ for $u \le \rho_v$ using the $v$-optimal extension $\hat{f}^{(v, \rho_v, M)}(u)$, where $M = \int_{\rho_v}^1 \hat{f}(u, v)\,du$ (see Theorem 2.1).

We now formulate conditions that will allow us to establish $\prec^+$-optimality of a solution of (36) in more general settings. These conditions always hold when $\prec$ is a total order and $V$ is finite. Generally,

$$\min_\prec(V^*(S)) = \{ z \in V^*(S) \mid \neg\exists w \in V^*(S),\ w \prec z \}$$

is a set and (36) is well defined when $\forall S$, this set is not empty and $\lambda(\rho, \min_\prec(V^*(S)))$ is unique, that is, the value $\lambda(\rho, z)$ is the same for all $\prec$-minimal vectors $z \in \min_\prec(V^*(S))$. A sufficient condition for this is that

$$\forall \rho\ \forall v\ \forall x \in (0, \underline{f}(\rho, v)]\ \forall z, w \in \min_\prec(V^*(\rho, v)), \quad \inf_{\eta < \rho} \frac{\underline{f}(\eta, z) - x}{\rho - \eta} = \inf_{\eta < \rho} \frac{\underline{f}(\eta, w) - x}{\rho - \eta} . \tag{37}$$

In this case, the respective Equations (36) on $u \in (0, \rho]$ are the same for all $z \in \min_\prec(V^*(S))$ and thus so are the estimate values $\hat{f}(u, z)$.

Example 4. L and U estimates for Example 3

We compute the L and U estimators for $\mathrm{RG}_{p+}$ for the sampling scheme and data in Example 3. For the two vectors $(0.6, 0.2)$ and $(0.6, 0)$, both the L and U estimates are 0 when $u > 0.6$; this is necessary from unbiasedness and nonnegativity because for these outcomes $\exists v \in V^*(S)$, $\mathrm{RG}_{p+}(v) = 0$. Otherwise, the L estimate is $\hat{\mathrm{RG}}_{p+}^{(L)}(S) = (v_1 - v_2')^p / v_2' - \int_{v_2'}^{v_1} \frac{(v_1 - x)^p}{x^2}\,dx$, where $v_2' = u$ when $S = \{1\}$ and $v_2' = v_2$ when $S = \{1, 2\}$. When $p \ge 1$, the U estimate is $\hat{\mathrm{RG}}_{p+}^{(U)}(S) = p\,(v_1 - u)^{p-1}$ when $u \in (v_2, v_1]$ and 0 when $u \le v_2 < v_1$. When $p \le 1$ the U estimate is $v_1^{p-1}$ when $u \in (v_2, v_1]$ and $\frac{(v_1 - v_2)^p - (v_1 - v_2)\,v_1^{p-1}}{v_2}$ when $u \le v_2 < v_1$.

The figure also includes the $v$-optimal estimates, discussed in Example 3. When $v_2 = 0$, the U estimates are $v$-optimal. The L estimate is not bounded when $v_2 = 0$ (but has bounded variance and is competitive).

[Figure: three panels showing the L, U, and $v$-optimal estimates of $\mathrm{RG}_{p+}$ under PPS sampling with tau=1, for p=0.5, p=1, and p=2.]
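The L estimate of Example 4 can be evaluated with a few lines of code. The sketch below is an illustration under stated assumptions: PPS sampling with tau = 1 is taken to mean that entry $i$ is sampled when $v_i \ge u$, the integral is approximated by a midpoint rule, and the function names are hypothetical. For $p = 1$ the integral has a closed form and the estimate reduces to $\ln(v_1 / v_2')$, which gives a convenient check.

```python
import math


def rg_tail_integral(v1, a, p, n=4000):
    """Midpoint-rule value of the integral of (v1 - x)^p / x^2 over (a, v1]."""
    width = (v1 - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * width
        total += (v1 - x) ** p / (x * x)
    return total * width


def l_estimate_rg(v1, v2, u, p):
    """L estimate of RGp+ for data (v1, v2) with v1 > v2 >= 0 and seed u in (0, 1]."""
    if u > v1:
        return 0.0  # entry 1 is not sampled, so the estimate must be 0
    # S = {1, 2} reveals v2; S = {1} only reveals that v2 < u
    v2_prime = v2 if u <= v2 else u
    return (v1 - v2_prime) ** p / v2_prime - rg_tail_integral(v1, v2_prime, p)


for u in (0.1, 0.4, 0.7):
    print(u, l_estimate_rg(0.6, 0.2, u, 1.0))
print(l_estimate_rg(0.6, 0.0, 0.05, 1.0), math.log(0.6 / 0.05))
```

With $v_2 = 0$ the value $\ln(v_1 / u)$ grows without bound as $u$ decreases, in line with the remark that the L estimate is not bounded in that case.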

We say that $Z \subset V$ is $\prec$-bounded if

$$\forall v \in Z\ \exists z \in \min_\prec(Z), \quad z \preceq v \tag{38}$$

That is, every $z \in Z$ is $\prec$-minimal or is preceded by some vector that is $\prec$-minimal in $Z$.

We say that an outcome $S$ is $\prec$-bounded if $V^*(S)$ is $\prec$-bounded, that is,

$$\forall v \in V^*(S)\ \exists z \in \min_\prec(V^*(S)), \quad z \preceq v \tag{39}$$

When all outcomes $S(u, v)$ are $\prec$-bounded, we say that a set of vectors $R$ represents $v$ if any outcome consistent with $v$ has a $\prec$-minimal vector in $R$:

$$\forall u \in (0, 1],\ \exists z \in R, \quad z \in \min_\prec(V^*(u, v)) .$$

We now show that we can obtain a $\prec^+$-optimal estimator if every vector $v$ has a set of finite size that represents it. Example 5 (Appendix) walks through a derivation of $\prec^+$-optimal estimators.

Lemma 5.1. /// satisfies dJTl l, (3% and 

$$\forall v, \quad \min\{ |R| \;:\; \forall u \in (0, 1],\ \exists z \in R,\ z \in \min_\prec V^*(u, v) \} < \infty ,$$

then a $\prec^+$-optimal estimator exists and must be equivalent to a solution of (36).

Proof. We provide an explicit construction of a $\prec^+$-optimal estimator for $f$.

Fixing $v$, we select a finite set of representatives. We can map the representatives (or a subset of them) to distinct subintervals covering $(0, 1]$. The subintervals have the form $(a_i, a_{i-1}]$ where $0 = a_n < \cdots < a_1 < a_0 = 1$ such that a representative $z$ that is minimal for $(a_i, a_{i-1}]$ is not minimal for $u \le a_i$. Such a mapping can always be obtained since from (1), each vector is consistent with an open interval of the form $(a, 1]$, and thus if $z$ is $\prec$-minimum at $V^*(u, v)$ (we must have $u > a$) it must be $\prec$-minimum for $V^*(x, v)$ for $x \in (a, u]$. Thus, the region on which $z$ is in $\min_\prec V^*(u, v)$ is open to the left. We can always choose a mapping such that the left boundary of this region corresponds to $a_i$.

Let $z^{(i)}$ ($i \in [n]$) be the representative mapped to outcomes $S(u, v)$ where $u \in (a_i, a_{i-1}]$. Since $V^*(u, v)$ is monotone non-decreasing with $u$, $i < j$ implies that $z^{(i)} \prec z^{(j)}$ or that they are incomparable in the partial order.

We construct a partially specified nonnegative estimator in steps, by solving (36) iteratively for the vectors $z^{(i)}$. Initially we invoke Theorem 2.1 to obtain estimate values for $S(u, z^{(1)})$, $u \in (0, 1]$, that minimize the variance for $z^{(1)}$. The result is a partially specified nonnegative estimator. In particular for $v$, the estimator is now specified for outcomes $S(u, v)$ where $u \in (a_1, 1]$. Any modification of this estimator on a subinterval of $(a_1, 1]$ with positive measure will strictly increase the variance for $z^{(1)}$ (or result in an estimator that cannot be completed to a nonnegative unbiased one).

After step $i$, we have a partially specified nonnegative estimator that is specified for $S(u, v)$ for $u \in (a_i, 1]$. The estimator is fully specified for $z^{(j)}$, $j \le i$, and is $\prec^+$-optimal on these vectors in the sense that any other partially specified nonnegative estimator that is fully specified for $z^{(j)}$, $j \le i$, and has strictly lower variance on some $z^{(j)}$ ($j \le i$) must have strictly higher variance on some $z^{(h)}$ such that $h < j$.

We now invoke Theorem 2.1 with respect to the vector $z^{(i+1)}$. The estimator is partially specified for $S(u, z^{(i+1)})$ on $u > a_i$ and we obtain estimate values for the outcomes $S(u, z^{(i+1)})$ for $u \in (0, a_i]$ that constitute a partially specified nonnegative estimator with minimum variance for $z^{(i+1)}$. Note again that this completion is unique (up to equivalence). This extension now defines the estimator on $S(u, v)$ for $u \in (a_{i+1}, 1]$.

Lastly, note that we must have $f(z^{(n)}) = f(v)$ because $f(z^{(n)}) < f(v)$ implies that (7) is violated for $v$ whereas the reverse inequality implies that (7) is violated for $z^{(n)}$. Since at step $n$ the estimator is specified for all outcomes $S(u, v)$ and unbiased, it is unbiased for $v$.

The estimator is invariant to the choice of the representative sets $R_v$ for $v \in V$ and also remains the same if we restrict $\prec$ so that it includes only relations between $v$ and $R_v$.

We so far showed that there is a unique, up to equivalence, partially specified nonnegative estimator that is $\prec^+$-optimal with respect to a vector $v$ and all vectors it depends on. Consider now all outcomes $S(u, v)$, for all $u$ and $v$, arranged according to the containment order on $V^*(u, v)$ according to decreasing $u$ values, with branching points when $V^*(u, v)$ changes. If for two vectors $v$ and $z$, the sets of outcomes $S(u, v)$, $u \in (0, 1]$ and $S(u, z)$, $u \in (0, 1]$ intersect, the intersection must be equal for $u > \rho$ for some $\rho < 1$. In this case the estimator values computed with respect to either $z$ or $v$ would be identical for $u \in (\rho, 1]$. Also note that partially specified nonnegative solutions on different branches are independent. Therefore, solutions with respect to different vectors $v$ can be consistently combined to a fully specified estimator. □

5.1 Continuous domains 

The assumptions of Lemma 5.1 may break on continuous domains. Firstly, outcomes may not be $\prec$-bounded and in particular, $\min_\prec(V^*(S))$ can be empty even when $V^*(S)$ is not, resulting in (36) not being well defined. Secondly, even if $\prec$ is a total order, minimum elements do not necessarily exist and thus (39) may not hold, and lastly, there may not be a finite set of representatives. To treat such domains, we utilize a notion of convergence with respect to $\prec$:

We define the $\prec$-lim of a function $h$ on a set of vectors $Z \subset V$:

$$\prec\text{-}\lim(h(\cdot), Z) = x \iff \forall v \in Z\ \forall \epsilon > 0\ \exists w \preceq v,\ \forall z \preceq w,\ |h(z) - x| \le \epsilon \tag{40}$$

The $\prec$-lim may not exist but is unique if it does. Note that when $Z$ is finite or more generally, $\prec$-bounded, and $h(z)$ is unique for all $z \in \min_\prec(Z)$, then $\prec\text{-}\lim(h(\cdot), Z) = h(\min_\prec Z)$.

We define the $\prec$-closure of $z$ as the set containing $z$ and all preceding vectors, $\mathrm{cl}_\prec(z) = \{ v \in V \mid v \preceq z \}$.

We provide an alternative definition of the $\prec$-lim using the notion of $\prec$-closure.

$$\prec\text{-}\lim(h(\cdot), Z) = x \iff \inf_{v \in Z}\ \sup_{z \in \mathrm{cl}_\prec(v) \cap Z} h(z) = \sup_{v \in Z}\ \inf_{z \in \mathrm{cl}_\prec(v) \cap Z} h(z) = x \tag{41}$$

We say that the lower bound function $\prec$-converges on outcome $S = S(\rho, v)$ if $\prec\text{-}\lim(\underline{f}(\eta, \cdot), V^*(S))$ exists for all $\eta \in (0, \rho)$. When this holds, the $\prec$-lim of the pointwise optimal values over consistent vectors $V^*(S)$ exists for all $M = \int_\rho^1 \hat{f}(u, v)\,du \le \underline{f}(\rho, v)$. We use the notation

$$\lambda_\prec(S, M) = \prec\text{-}\lim(\lambda(\rho, \cdot, M), V^*(S)) = \inf_{0 \le \eta < \rho} \frac{\prec\text{-}\lim(\underline{f}(\eta, \cdot), V^*(S)) - M}{\rho - \eta} .$$

When the partially specified estimator $\hat{f}$ is clear from context, we omit the parameter $M$ and use the notation

$$\lambda_\prec(S) = \prec\text{-}\lim(\lambda(\rho, \cdot), V^*(S)) = \inf_{0 \le \eta < \rho} \frac{\prec\text{-}\lim(\underline{f}(\eta, \cdot), V^*(S)) - \int_\rho^1 \hat{f}(u, v)\,du}{\rho - \eta} .$$

We can finally propose a generalization of (36):

$$\forall S, \quad \hat{f}(S) = \lambda_\prec(S) \tag{42}$$

which is well defined when the lower bound function $\prec$-converges for all $S$:

$$\forall S = S(\rho, v),\ \forall \eta < \rho, \quad \prec\text{-}\lim(\underline{f}(\eta, \cdot), V^*(S)) \text{ exists.} \tag{43}$$

Using the definition of $\prec$-convergence and (2) we obtain that an estimator is equivalent to (42) if and only if

$$\forall v\ \forall \rho \in (0, 1], \quad \lim_{\eta \to \rho^-} \frac{\int_\eta^\rho \hat{f}(u, v)\,du}{\rho - \eta} = \lambda_\prec(\rho, v) \tag{44}$$

We show that equivalence to (42) is necessary for $\prec^+$-optimality. To facilitate the proof, we express $\prec^+$-optimality in terms of restricted variance+-optimality:

Lemma 5.2. An estimator is $\prec^+$-optimal if and only if, for all $v \in V$, it is variance+-optimal with respect to $\mathrm{cl}_\prec(v)$.

Proof. If there is $v$ such that $\hat{f}$ is not variance+-optimal on $\mathrm{cl}_\prec(v)$, there is an alternative estimator with strictly lower variance on some $z \in \mathrm{cl}_\prec(v)$ and at most the variance of $\hat{f}$ on all $\mathrm{cl}_\prec(v) \setminus \{z\}$. Since $\mathrm{cl}_\prec(v)$ contains all vectors that precede $z$, the estimator $\hat{f}$ cannot be $\prec^+$-optimal. To establish the converse, assume an estimator $\hat{f}$ is variance+-optimal on $\mathrm{cl}_\prec(v)$ for all $v$. Consider $z \in V$. Since $\hat{f}$ is variance+-optimal on $\mathrm{cl}_\prec(z)$, there is no alternative estimator with strictly lower variance on $z$ and at most the variance of $\hat{f}$ on all preceding vectors. Since this holds for all $z$, we obtain that $\hat{f}$ is $\prec^+$-optimal. □

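Lemma 5.3. If $f$ satisfies (7) and (43) then $\hat{f}$ is $\prec^+$-optimal only if it satisfies (44).

Proof. Lemma 5.2 states that an estimator is $\prec^+$-optimal if and only if $\forall w \in V$ it is variance+-optimal with respect to $\mathrm{cl}_\prec(w)$. By the necessary condition for variance+-optimality with respect to a set of vectors, the latter holds only if $\forall v \in V$, $\forall \rho \in (0, 1]$,

$$\inf_{z \in \mathrm{cl}_\prec(w) \cap V^*(\rho, v)} \lambda(\rho, z) \ \le\ \lim_{\eta \to \rho^-} \frac{\int_\eta^\rho \hat{f}(u, v)\,du}{\rho - \eta} \ \le\ \sup_{z \in \mathrm{cl}_\prec(w) \cap V^*(\rho, v)} \lambda(\rho, z) . \tag{45}$$

From definition, $S(\rho, z) = S(\rho, v)$ for all vectors $z \in V^*(\rho, v)$. Moreover, for $z \in V^*(\rho, v)$ there is a nonempty interval $(\eta_z, \rho]$ such that $\forall u \in (\eta_z, \rho]$, $V^*(u, z) = V^*(u, v)$. Therefore, for all $z \in V^*(\rho, v)$, the limits $\lim_{\eta \to \rho^-} \frac{\int_\eta^\rho \hat{f}(u, z)\,du}{\rho - \eta}$ are the same. Therefore, (45) is equivalent to: $\forall v$, $\forall \rho \in (0, 1]$,

$$\sup_{w \in V^*(\rho, v)}\ \inf_{z \in \mathrm{cl}_\prec(w) \cap V^*(\rho, v)} \lambda(\rho, z) \ \le\ \lim_{\eta \to \rho^-} \frac{\int_\eta^\rho \hat{f}(u, v)\,du}{\rho - \eta} \ \le\ \inf_{w \in V^*(\rho, v)}\ \sup_{z \in \mathrm{cl}_\prec(w) \cap V^*(\rho, v)} \lambda(\rho, z) . \tag{46}$$

□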

We leave open the question of determining the most inclusive conditions on $f$ and $\prec$ under which a $\prec^+$-optimum exists, and thus the solution of (42) is $\prec^+$-optimal. We show that any solution of (42) is unbiased and nonnegative when $f$ has a nonnegative unbiased estimator.

Lemma 5.4. When $f$ and $\prec$ satisfy (7) and (43), a solution $\hat{f}^{(\prec)}$ of (42) is unbiased and nonnegative.

Proof. From Lemma 3.1, since all values are in-range, the solution is unbiased and nonnegative. □

6. THE U ESTIMATOR 

The estimator $\hat{f}^{(U)}$ satisfies (19b) with equality:

$$\forall S(\rho, v), \quad \hat{f}(\rho, v) = \sup_{z \in V^*(S)}\ \inf_{0 \le \eta < \rho} \frac{\underline{f}(\eta, z) - \int_\rho^1 \hat{f}(u, v)\,du}{\rho - \eta} \tag{47}$$

The U estimator is not always variance+-optimal. We do show, however, that under a natural condition, it is order-based optimal with respect to an order that prioritizes vectors with higher $f$ values. The condition states that for all $S(\rho, v)$ and $\eta < \rho$, the supremum of the lower bound function $\underline{f}(\eta, z)$ over $z \in V^*(S)$ is attained (in the limiting sense) at vectors that maximize $f$ on $V^*(S)$. Formally:

$$\forall \eta < \rho, \quad \lim_{x \to \overline{f}(S)}\ \sup_{z \in V^*(S) \mid f(z) \ge x} \underline{f}(\eta, z) = \sup_{z \in V^*(S)} \underline{f}(\eta, z) , \tag{48}$$

where $\overline{f}(S) = \sup_{z \in V^*(S)} f(z)$.

Lemma 6.1. If $f$ satisfies (48), then the U estimator is $\prec^+$-optimal with respect to the order $z \prec v \iff f(z) > f(v)$.

Proof. We can show that when (48) holds then (42) is the same as (47). □

The condition (48) is satisfied by $\mathrm{RG}_p$ and $\mathrm{RG}_{p+}$. In this case, the conditions of Lemma 5.1 are also satisfied and thus the U estimator is $\prec^+$-optimal.

Conclusion 

We argued for the value of customizing estimators to patterns in the data while retaining performance guarantees on a larger domain, taking an optimization approach to estimator derivation. We demonstrated our approach for coordinated shared-seed samples and believe we provided foundations for going further: designing automated tools to derive estimators according to specifications and exploring customization for other sampling schemes, in particular, independent sampling.

Example 5. Walk-through derivation of $\prec^+$-optimal estimators

We derive $\prec^+$-optimal $\mathrm{RG}_{1+}$ estimators over the discrete domain $V = \{0, 1, 2, 3\}^2$. Assuming the same sampling scheme on both entries, there are 3 threshold values of interest, where $\pi_i$ for $i \in [3]$ is such that an entry of value $i$ is sampled if and only if $u \le \pi_i$. We have $\pi_1 < \pi_2 < \pi_3$. The lower bounds $\underline{\mathrm{RG}}_{1+}^{(v)}$ are step functions with steps at $u = \pi_i$. The table shows $\underline{\mathrm{RG}}_{1+}^{(v)}(u)$ for all $u$ and $v$.





u \ v               (1,0)   (2,1)   (2,0)   (3,2)   (3,1)   (3,0)
$(0, \pi_1]$          1       1       2       1       2       3
$(\pi_1, \pi_2]$      0       1       1       1       2       2
$(\pi_2, \pi_3]$      0       0       0       1       1       1
$(\pi_3, 1]$          0       0       0       0       0       0























The $v$-optimal estimate $\hat{\mathrm{RG}}_{1+}^{(v)}(u)$ is the negated slope at $u$ of the lower hull of $\underline{\mathrm{RG}}_{1+}^{(v)}$. The lower hull of each step function is piecewise linear with breakpoints at a subset of the $\pi_i$, and thus the $v$-optimal estimates are constant on each segment $(\pi_{i-1}, \pi_i]$. The table shows the estimates for all $v$ and $u$. The notation $\downarrow$ refers to the value in the same column one row below and $\downdownarrows$ to the value two rows below.



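u \ v | (1,0) | (2,1) | (2,0) | (3,2) | (3,1) | (3,0)
$(0, \pi_1]$ | $\frac{1}{\pi_1}$ | $\frac{1}{\pi_2}$ | $\frac{2 - \downarrow(\pi_2 - \pi_1)}{\pi_1}$ | $\frac{1}{\pi_3}$ | $\frac{2 - \downdownarrows(\pi_3 - \pi_2)}{\pi_2}$ | $\frac{3 - \downdownarrows(\pi_3 - \pi_2) - \downarrow(\pi_2 - \pi_1)}{\pi_1}$
$(\pi_1, \pi_2]$ | 0 | $\frac{1}{\pi_2}$ | $\min\{\frac{2}{\pi_2}, \frac{1}{\pi_2 - \pi_1}\}$ | $\frac{1}{\pi_3}$ | $\frac{2 - \downarrow(\pi_3 - \pi_2)}{\pi_2}$ | $\min\{\frac{3 - \downarrow(\pi_3 - \pi_2)}{\pi_2}, \frac{2 - \downarrow(\pi_3 - \pi_2)}{\pi_2 - \pi_1}\}$
$(\pi_2, \pi_3]$ | 0 | 0 | 0 | $\frac{1}{\pi_3}$ | $\min\{\frac{2}{\pi_3}, \frac{1}{\pi_3 - \pi_2}\}$ | $\min\{\frac{3}{\pi_3}, \frac{2}{\pi_3 - \pi_1}, \frac{1}{\pi_3 - \pi_2}\}$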
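The $v$-optimal estimates described above (negated slopes of the lower hull of a step lower bound function) can be computed mechanically. The following is a minimal sketch under stated assumptions: the threshold values $\pi = (0.2, 0.5, 0.9)$ and the function name are hypothetical, and the hull is built with a standard monotone-chain scan over the corner points of the step function.

```python
def v_optimal_estimates(steps):
    """Negated slopes of the lower hull of a non-increasing step lower bound function.

    steps: (right_endpoint, value) pairs sorted by right_endpoint, with last endpoint 1.0;
    the function equals `value` on (previous_endpoint, right_endpoint].
    Returns (left, right, estimate) triples, one per linear piece of the hull.
    """
    # The hull is constrained by the value near 0 and by the value just after each drop.
    points = [(0.0, steps[0][1])]
    for (right, _), (_, next_value) in zip(steps, steps[1:]):
        points.append((right, next_value))
    points.append((1.0, steps[-1][1]))

    hull = []
    for p in points:
        # Pop the middle point while it lies on or above the segment joining its neighbours.
        while len(hull) >= 2:
            (ax, ay), (bx, by) = hull[-2], hull[-1]
            cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return [(ax, bx, (ay - by) / (bx - ax)) for (ax, ay), (bx, by) in zip(hull, hull[1:])]


# Hypothetical thresholds pi_1, pi_2, pi_3 (illustration only).
pi1, pi2, pi3 = 0.2, 0.5, 0.9
lower_bounds = {
    (1, 0): [(pi1, 1.0), (1.0, 0.0)],
    (2, 0): [(pi1, 2.0), (pi2, 1.0), (1.0, 0.0)],
    (3, 0): [(pi1, 3.0), (pi2, 2.0), (pi3, 1.0), (1.0, 0.0)],
}
for v, steps in lower_bounds.items():
    print(v, v_optimal_estimates(steps))
```

Each returned triple gives a segment of seeds and the constant estimate on it; the estimates integrate to the lower bound value near 0 minus the value at 1, which is what unbiasedness for $v$ requires in this example.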



The order $(2, 1) \prec (2, 0)$ and $(3, 2) \prec (3, 1) \prec (3, 0)$ yields the L estimator, which is $v$-optimal for $(1, 0)$, $(2, 1)$, and $(3, 2)$. The order $(2, 0) \prec (2, 1)$ and $(3, 0) \prec (3, 1) \prec (3, 2)$ yields the U estimator, which is $v$-optimal for $(1, 0)$, $(2, 0)$, and $(3, 0)$.

To specify an estimator, we need to specify it on all possible outcomes, where each distinct outcome is uniquely determined by the sets $V^*(S)$. The 8 possible outcomes (we exclude those consistent with vectors with $\mathrm{RG}_{1+}(v) = 0$, on which the estimate must be 0) are $(1, 0)$, $(2, \le 1)$, $(2, 1)$, $(3, \le 2)$, $(3, 2)$, $(3, \le 1)$, $(3, 1)$, and $(3, 0)$.

We show how we construct the $\prec^+$-optimal estimator for $\prec$ which prioritizes vectors with difference of 2: $(3, 1) \prec (3, 2) \prec (3, 0)$ and $(2, 0) \prec (2, 1)$. The estimator is $v$-optimal for $(3, 1)$, $(2, 0)$, and $(1, 0)$. This determines the estimates $\hat{\mathrm{RG}}_{1+}^{(\prec)}$ on all outcomes consistent with these vectors: The value on outcome $(1, 0)$ is $\hat{\mathrm{RG}}_{1+}^{(1,0)}((0, \pi_1])$, the values on outcomes $(2, \le 1)$ and $(2, 0)$ are according to $\hat{\mathrm{RG}}_{1+}^{(2,0)}$ on $(\pi_1, \pi_2]$ and $(0, \pi_1]$, respectively, and the value on the outcomes $(3, \le 2)$, $(3, \le 1)$ and $(3, 1)$ is according to $\hat{\mathrm{RG}}_{1+}^{(3,1)}$ on $(\pi_2, \pi_3]$ and $(\pi_1, \pi_2]$. These values are provided in the table above. The remaining outcomes are $(3, 0)$, $(3, 2)$, and $(2, 1)$. We need to specify the estimator so that it is unbiased on these vectors, given the existing specification. We have



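$$(2, 1): \quad \hat{\mathrm{RG}}_{1+}^{(\prec)}(2, 1) = \frac{1 - (\pi_2 - \pi_1)\,\hat{\mathrm{RG}}_{1+}^{(\prec)}(2, \le 1)}{\pi_1}$$

$$(3, 0): \quad \hat{\mathrm{RG}}_{1+}^{(\prec)}(3, 0) = \frac{3 - (\pi_3 - \pi_2)\,\hat{\mathrm{RG}}_{1+}^{(\prec)}(3, \le 2) - (\pi_2 - \pi_1)\,\hat{\mathrm{RG}}_{1+}^{(\prec)}(3, \le 1)}{\pi_1}$$

$$(3, 2): \quad \hat{\mathrm{RG}}_{1+}^{(\prec)}(3, 2) = \frac{1 - (\pi_3 - \pi_2)\,\hat{\mathrm{RG}}_{1+}^{(\prec)}(3, \le 2)}{\pi_2}$$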



Observe that to apply these estimators, we do not have to precompute the estimator on all possible outcomes. An estimate only depends on the values of the estimate on all less informative outcomes. In a discrete domain as in this example, the number of such outcomes is the number of breakpoints larger than the seed $u$ (which is at most the size of the domain).